PREFACE

The research which is the subject of this report was performed by personnel of 
the Lockheed Engineering and Management Services Company for the Supporting 
Research project of the Agriculture and Resources Inventory Surveys Through
Aerospace Remote Sensing program. 

This program is underway within the Earth Observations Division, Space and 
Life Sciences Directorate, at the National Aeronautics and Space 
Administration, Lyndon B. Johnson Space Center, Houston, Texas. 

1. INTRODUCTION


The CLASSY program was developed to fit mixtures of multivariate normal
distributions to multichannel, multiacquisition spectral data sets. It thus
serves simultaneously as a density estimator (providing an unconditional like- 
lihood for an observation) and a clustering algorithm (providing a conditional 
likelihood). Since it is anticipated that a particular segment will exhibit 
characteristic features in a sequence of Landsat acquisitions, it is further 
anticipated that CLASSY clusters will describe these features. 
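
The two roles of a fitted mixture can be illustrated with a minimal sketch. It assumes univariate normal components for brevity (CLASSY fits multivariate ones), and every function name and numeric value below is hypothetical rather than taken from the CLASSY program.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, alpha, means, variances):
    """Density-estimator view: p(x) = sum over i of alpha_i * p(x | i)."""
    return sum(a * normal_pdf(x, m, v) for a, m, v in zip(alpha, means, variances))


def cluster_posteriors(x, alpha, means, variances):
    """Clustering view: probability of each component given the observation x."""
    joint = [a * normal_pdf(x, m, v) for a, m, v in zip(alpha, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture with equal priors.
alpha = [0.5, 0.5]
means = [0.0, 4.0]
variances = [1.0, 1.0]

density_at_midpoint = mixture_density(2.0, alpha, means, variances)
posteriors_at_midpoint = cluster_posteriors(2.0, alpha, means, variances)
posteriors_near_first = cluster_posteriors(0.0, alpha, means, variances)
```

An observation halfway between two equally weighted components is assigned to each with equal probability, while an observation at the first component's mean is assigned almost entirely to that component.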

A primary goal of the developers of the CLASSY program was to assist in the 
estimation of crop proportions in a Large Area Crop Inventory Experiment 
(LACIE) segment. This may be accomplished if the proportion of each crop of 
interest in a cluster can be estimated. If a small subset of the pixels can 
be labeled using some other procedure, the resulting training set will allow 
the estimation of the proportions by the maximum likelihood method. Such a 
procedure was developed on the Laboratory for Applications of Remote Sensing 
(LARS) system at Purdue University (ref. 1). As a first test, a training set 
of approximately 100 pixels for each of 10 segments was labeled from ground- 
truth data. The estimated proportion of small grains was compared to the 
proportion estimated from ground truth, with very encouraging results. 

For the present study, the training set was labeled by an analyst/interpreter
(AI), and approximately 200 dots per segment were used. Thus, the experiment 
more nearly approximates applications conditions. Ten LACIE Transition Year 
segments were analyzed using the maximum likelihood labeling technique, and, 
using two different estimates, the proportion of small grains was compared to 
ground truth, to the standard Procedure 1 (PI) estimate, and to an estimate 
from a linear classifier trained on labeled dots. This linear classifier was 
part of the Statistical Analysis System (SAS) package (ref. 2). 

Ten LACIE Transition Year segments were chosen under the constraints (1) that
there should be some geographical variety, (2) that spectral data would be
available at LARS, (3) that four acquisitions of good quality would be
available, and (4) that approximately 200 AI-labeled dots would be available for
each acquisition. In all cases, the statistic of interest was the proportion
of small grains grown in that year. The training dots had previously been
labeled by AIs for the PI crop proportion estimates. As a result, PI
estimates and ground-truth values of the statistic in question were available for
comparison. For each of the acquisitions, the spectral data were first pro- 
jected onto the Kauth-Thomas greenness-brightness plane (ref. 3), and all 
analyses were conducted on this reduced data set. 

3. MAXIMUM LIKELIHOOD PROPORTION ESTIMATION AND CLUSTER LABELING PROCEDURE 

The following is a description of the maximum likelihood proportion estimation 
and cluster labeling procedure evaluated in this study. The purpose is to 
obtain estimates of the proportion of the class of interest (in this experi- 
ment, small grains) in each component distribution or cluster generated by the 
CLASSY program. 

Suppose that the CLASSY program is used to approximate the multivariate
mixture density of the data. This will result in a set of multivariate normal
distributions $p(x \mid i)$, $i = 1, \ldots, c$, and a set of prior probabilities
$\alpha_i$, $i = 1, \ldots, c$. Now suppose that we have a set of data points
$x_j$, $j = 1, \ldots, N$, and a set of possible class labels $\phi_\ell$,
$\ell = 1, \ldots, M$. Then, the joint probability of observing data point $x_j$
associated with label $\phi_\ell$ may be formulated as follows.

$$p(x_j, \phi_\ell) = \sum_{i=1}^{c} \alpha_i \, p(x_j \mid i) \, p(\phi_\ell \mid x_j, i) \qquad (1)$$

Assume that

$$p(\phi_\ell \mid x_j, i) = p(\phi_\ell \mid i) = \beta_{\ell i} \qquad (2)$$

which implies that the label random variable $\phi$ is conditionally independent
of the observation $x_j$; i.e., given that one is sampling from distribution $i$,
no further information is conveyed by knowing $x_j$.

Using this model, we see that the proportion of class $\phi_\ell$ may be estimated as

$$P(\phi_\ell) = \sum_{i=1}^{c} \alpha_i \beta_{\ell i} \qquad (3)$$

and $\beta_{\ell i}$ may be interpreted as the proportion of distribution $i$ that is
composed of class $\phi_\ell$.

Alternatively, each cluster may be labeled by selecting the class with the
largest value of $\beta_{\ell i}$. A proportion estimate may then be obtained as follows:

$$P(\phi_\ell) = \sum_{i} \alpha_i \quad \text{for all } i \text{ such that } \beta_{\ell i} = \max_{k} \beta_{k i} \qquad (4)$$

The first proportion estimator will be called a stratified maximum likelihood 
estimator, and the second will be called a labeled cluster maximum likelihood 
proportion estimator. 
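
The difference between the two estimators can be seen in a minimal sketch, assuming the cluster priors and the class shares within each cluster are already known. The function names and all numbers are hypothetical and are not taken from the study.

```python
def stratified_estimate(alpha, beta_row):
    """Equation (3): sum over clusters of (cluster prior) * (class share of that cluster)."""
    return sum(a * b for a, b in zip(alpha, beta_row))


def labeled_cluster_estimate(alpha, beta, label):
    """Equation (4): sum the priors of clusters whose largest class share belongs to `label`.

    beta[l][i] is the share of cluster i that belongs to class l.
    Ties go to the lowest class index, which is an arbitrary choice made here.
    """
    total = 0.0
    for i, a in enumerate(alpha):
        column = [beta[l][i] for l in range(len(beta))]
        if column.index(max(column)) == label:
            total += a
    return total


# Hypothetical example: three clusters, two classes (0 = small grains, 1 = other).
alpha = [0.5, 0.3, 0.2]
beta = [
    [0.9, 0.4, 0.1],  # share of each cluster that is class 0
    [0.1, 0.6, 0.9],  # share of each cluster that is class 1
]

stratified = stratified_estimate(alpha, beta[0])
labeled = labeled_cluster_estimate(alpha, beta, 0)
```

With these values the stratified estimate credits the small-grain share of every cluster, whereas the labeled cluster estimate counts only the first cluster and ignores the small grains in the mixed second cluster.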

To estimate $\beta_{\ell i}$, a maximum likelihood approach may be used, assuming that all
$\alpha_i$ and $p(x_j \mid i)$ are known.

Given a random sample of labeled data points, the likelihood function is 


$$L = \prod_{\ell=1}^{M} \prod_{j_\ell=1}^{N_\ell} p(x_{j_\ell}, \phi_\ell) \qquad (5)$$

where $x_{j_\ell}$, $j_\ell = 1, \ldots, N_\ell$, are those data points labeled as coming from class $\phi_\ell$.

Under the model, the likelihood function may be written as

$$L = \prod_{\ell=1}^{M} \prod_{j_\ell=1}^{N_\ell} \sum_{i=1}^{c} \alpha_i \beta_{\ell i} \, p(x_{j_\ell} \mid i) \qquad (6)$$


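Taking the log of the likelihood function and introducing the constraint that
$\sum_{\ell=1}^{M} \beta_{\ell i} = 1$ for $i = 1, \ldots, c$, using Lagrangian multipliers, the function to
be maximized becomes

$$F = \sum_{\ell=1}^{M} \sum_{j_\ell=1}^{N_\ell} \ln \left[ \sum_{i=1}^{c} \alpha_i \beta_{\ell i} \, p(x_{j_\ell} \mid i) \right] - \sum_{i=1}^{c} \eta_i \left( \sum_{\ell=1}^{M} \beta_{\ell i} - 1 \right) \qquad (7)$$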


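Maximizing with respect to the $\beta_{\ell i}$ results in a solution of
$\partial F / \partial \beta_{\ell i} = 0$, which is given by

$$\beta_{\ell i} = \frac{S_{\ell i}}{\sum_{\ell'=1}^{M} S_{\ell' i}} \qquad (8)$$

where

$$S_{\ell i} = \sum_{j_\ell=1}^{N_\ell} \frac{\alpha_i \beta_{\ell i} \, p(x_{j_\ell} \mid i)}{\sum_{k=1}^{c} \alpha_k \beta_{\ell k} \, p(x_{j_\ell} \mid k)} \qquad (9)$$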


Thus, the $\beta_{\ell i}$ terms may be estimated using a fixed-point iteration scheme
beginning with

$$\beta_{\ell i} = \frac{1}{M}, \qquad \ell = 1, \ldots, M \text{ and } i = 1, \ldots, c \qquad (10)$$
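
The fixed-point scheme of equations (8) through (10) can be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes a uniform starting value of 1/M, a fixed iteration count in place of a convergence test, and precomputed component densities. The function name, argument layout, and sample data are all hypothetical.

```python
def estimate_beta(alpha, dens, labels, n_classes, n_iter=200):
    """Fixed-point iteration for beta[l][i], the share of cluster i that is class l.

    alpha     : list of c cluster priors
    dens      : dens[j][i] = p(x_j | i) for labeled point j and cluster i
    labels    : labels[j] in range(n_classes), the class assigned to point j
    n_classes : number of classes M
    """
    c = len(alpha)
    # Assumed uniform start: every class gets share 1/M in every cluster.
    beta = [[1.0 / n_classes] * c for _ in range(n_classes)]
    for _ in range(n_iter):
        # Equation (9): accumulate S[l][i] over the points labeled l.
        s = [[0.0] * c for _ in range(n_classes)]
        for p, l in zip(dens, labels):
            terms = [alpha[i] * beta[l][i] * p[i] for i in range(c)]
            denom = sum(terms)
            if denom > 0.0:
                for i in range(c):
                    s[l][i] += terms[i] / denom
        # Equation (8): normalize over classes within each cluster.
        for i in range(c):
            column_total = sum(s[l][i] for l in range(n_classes))
            if column_total > 0.0:  # leave beta unchanged if no labeled point supports cluster i
                for l in range(n_classes):
                    beta[l][i] = s[l][i] / column_total
    return beta


# Hypothetical data: two fully separated clusters, eight labeled points.
alpha = [0.5, 0.5]
dens = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
labels = [0, 0, 0, 1, 1, 1, 1, 1]
beta = estimate_beta(alpha, dens, labels, n_classes=2)
```

In this separated case each labeled point belongs wholly to one cluster, so the estimates reduce to label frequencies within each cluster: three of the four points in the first cluster carry label 0, and all four points in the second cluster carry label 1.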


4. RESULTS 


The results of the experiment are given in table 1. 

a. Column 1 contains the LACIE segment number. 

b. Column 2 contains the percentage of small grains found in the segment, as 
computed from ground-truth information. 

c. Column 3 contains an estimate of the small-grain percentage obtained by 
clustering every other pixel in the segment using the CLASSY program. The 
maximum likelihood procedure described in the previous section was then 
used to estimate the proportion of small grains in each CLASSY cluster, 
using a set of approximately 200 AI labels for training. A weighted average
small-grain percentage estimate was then obtained by multiplying the
size of each cluster (i.e., its prior probability) by its proportion of 
small grains and summing over clusters. See equation (3). 

d. Column 4 contains the proportion of the AI labels that were labeled small
grains; i.e., a simple random sample estimate based on the approximately 200
training pixels.

e. Column 5 contains an estimate of the proportion of small grains obtained 
by calling an entire CLASSY cluster small grains or nonsmall grains, if it 
had been estimated to contain a plurality of one category or the other 
using the maximum likelihood technique. Thus, the estimate is the sum of 
the prior probabilities of the small-grain clusters. See equation (4).

f. Column 6 is the PI estimate of small grains actually obtained during the 
LACIE Transition Year processing of each segment. 

g. Column 7 is an estimate of the percentage of small grains obtained by 
deriving a Fisher linear classifier (using equal a priori probabilities) 
from the labeled training pixels. This classifier was then applied to 
every second pixel in the segment, and the proportion of those classified 
as small grains was computed. Thus, exactly the same data were used for 
this estimate as were used for the maximum likelihood cluster labeling 
procedure. 

h. Column 8 is an estimate of the percentage of small grains obtained from
the Fisher empirical Bayes linear classifier, where the crop a priori 
probabilities are assumed equal to the simple random sample estimate 
obtained from the AI dots. 
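
The two linear classifiers in items g and h differ only in their a priori probabilities. A minimal sketch of how a prior shifts a two-class linear decision rule is given below. It assumes two features, a pooled covariance matrix, and a log-prior offset. It is not the SAS routine used in the study, and all names and data are hypothetical.

```python
import math


def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pooled_cov2(class0, class1):
    """Pooled covariance [[a, b], [b, d]] of two classes, returned as (a, b, d)."""
    a = b = d = 0.0
    for pts in (class0, class1):
        m = mean2(pts)
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            a += dx * dx
            b += dx * dy
            d += dy * dy
    dof = len(class0) + len(class1) - 2
    return a / dof, b / dof, d / dof


def fit_linear_classifier(class0, class1, prior1=0.5):
    """Return (w, offset); a pixel is called class 1 when w . x + offset > 0."""
    m0, m1 = mean2(class0), mean2(class1)
    a, b, d = pooled_cov2(class0, class1)
    det = a * d - b * b
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # w = inverse(pooled covariance) * (m1 - m0)
    w = ((d * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det)
    mid = ((m0[0] + m1[0]) / 2.0, (m0[1] + m1[1]) / 2.0)
    # The log-prior term vanishes when the two priors are equal.
    offset = -(w[0] * mid[0] + w[1] * mid[1]) + math.log(prior1 / (1.0 - prior1))
    return w, offset


def proportion_class1(pixels, w, offset):
    """Fraction of pixels assigned to class 1."""
    hits = sum(1 for x, y in pixels if w[0] * x + w[1] * y + offset > 0.0)
    return hits / len(pixels)


# Hypothetical training dots (class 1 = small grains) and pixels to classify.
class0 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
class1 = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)]
pixels = [(0.0, 0.0), (1.0, 1.0), (2.55, 2.55), (5.0, 5.0)]

w_eq, b_eq = fit_linear_classifier(class0, class1, prior1=0.5)
w_lo, b_lo = fit_linear_classifier(class0, class1, prior1=0.1)
p_equal = proportion_class1(pixels, w_eq, b_eq)
p_low_prior = proportion_class1(pixels, w_lo, b_lo)
```

Lowering the prior for class 1 moves the decision boundary toward the class 1 mean, so the pixel lying just past the midpoint is no longer counted as small grains and the estimated proportion drops.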

At the bottom of each column of estimates is the mean error in the percentage 
of small grains as observed from ground truth (the bias) and the mean squared 
error in the estimated percentage of small grains. 
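
The two summary statistics can be computed as in the following minimal sketch. It assumes the error is taken as the estimate minus the ground-truth value, a sign convention the text does not state. The function name and the numbers are hypothetical and are not values from table 1.

```python
def bias_and_mse(estimates, truth):
    """Mean error (bias) and mean squared error of estimates against ground truth."""
    errors = [e - t for e, t in zip(estimates, truth)]  # assumed sign: estimate - truth
    n = len(errors)
    bias = sum(errors) / n
    mse = sum(err * err for err in errors) / n
    return bias, mse


# Hypothetical small-grain percentages for three segments.
estimates = [12.0, 30.0, 45.0]
truth = [10.0, 33.0, 45.0]
bias, mse = bias_and_mse(estimates, truth)
```

Positive and negative errors cancel in the bias but not in the mean squared error, which is why an estimator can be nearly unbiased and still have a large mean squared error.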

5. CONCLUSION AND RECOMMENDATIONS 

The results presented in table 1 suggest that the Fisher linear classifier 
with equal a priori probabilities (column 7) was the best proportion estima- 
tion technique both in terms of bias and mean squared error. There seems to 
be no reason to expect this advantage from the linear classifier. Two possible
interpretations are presented: (1) Perhaps the linear classifier
requires fewer parameters to be estimated than either the maximum
likelihood techniques or PI; this may have led to a stability which was
expressed in the lower mean squared error. (2) Although AI labels introduced
biases into the proportion estimates, these same labels were, on the average,
fairly good at characterizing the distribution of small-grain and
nonsmall-grain signatures.

Thus, the Fisher classifier with equal a priori probabilities was able to
effectively use the AI's ability to characterize the signature of small
grains and nonsmall grains. This conjecture is supported by the results 
obtained from the Fisher empirical Bayes linear classifier (column 8). This 
classifier is identical to the Fisher classifier used to obtain the results 
shown in column 7, except that the a priori probabilities of the crops were 
set equal to the simple random sample estimate obtained from the AI dots used 
to train the classifier. The bias and mean squared error obtained for the
Fisher empirical Bayes linear classifier are very similar to the results 
obtained from most of the other proportion estimation procedures. 

The stratified maximum likelihood estimate and PI performed about equally in
terms of mean squared error. This result is contrary to previous studies,
which indicated a reduction in mean squared error for the stratified maximum
likelihood estimation technique as compared to PI when both techniques used
about 100 ground-truth-labeled pixels. It appears that errors present in the
AI labels are sufficient to nullify any advantage of stratified maximum 
likelihood estimation over PI. 

Stratified maximum likelihood estimation exhibits an advantage over the simple 
random sampling method. This result is consistent with the previous results 
obtained using ground truth. 

Cluster labeling using maximum likelihood estimates of cluster purities was
the least accurate procedure. This technique performed poorly because
clusters are not always pure and also because erroneous labels are
often sufficient to change the label of a cluster from that which would have
been obtained using ground truth.

It is recommended that the maximum likelihood approach to proportion estima- 
tion using CLASSY cluster statistics be studied further, both to understand 
its potential and to pave the way for related but improved techniques for 
incorporating label information (particularly AI label information) into the
process of cluster proportion estimation and labeling. The significance of 
the essentially unbiased result obtained with the Fisher linear classifier 
should be investigated. This finding may have implications concerning the 
fundamental process of AI labeling. 